Finite-state transducer based Hungarian LVCSR with explicit modeling of phonological changes

Authors

Máté Szarvas

Sadaoki Furui

Abstract

This article describes the design and experimental evaluation of the first Hungarian large vocabulary continuous speech recognition (LVCSR) system. The architecture of the recognition system is based on the recently proposed weighted finite state transducer (WFST) paradigm. The task domain is the recognition of fluently read sentences selected from a major daily newspaper. Recognition performance is evaluated using both monophone and triphone gender-independent acoustic models. The vocabulary units used in the system are morpheme-based in order to provide sufficient coverage of the large number of word forms that result from affixation and compounding in Hungarian. The language model is a statistical morpheme bigram model. Besides the basic list-style pronunciation dictionary model, we evaluate a novel phonology modeling component that describes the phonological changes prevalent in fluent Hungarian. Thanks to the flexible transducer-based architecture of the system, the phonological component is integrated seamlessly with the basic modules, with no need to modify the decoder itself. The proposed phonological model reduces the error rate by a relative 8.32% compared with the baseline triphone system. The morpheme error rate of the best configuration is 17.74% in a 1200-morpheme task with a test set perplexity of 70.
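The relation between the two error figures quoted in the abstract can be sketched with a little arithmetic. The Python snippet below is only an illustration: the function names are hypothetical, and it assumes that the best configuration (17.74% morpheme error rate) is the triphone system with the phonological model, which the abstract suggests but does not state outright. Under that assumption, the baseline triphone error rate can be back-computed from the relative reduction.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction in percent, given two error rates in percent."""
    return 100.0 * (baseline - improved) / baseline


def implied_baseline(improved: float, reduction_pct: float) -> float:
    """Baseline error rate implied by an improved rate and a relative reduction."""
    return improved / (1.0 - reduction_pct / 100.0)


# Figures from the abstract; pairing them this way is an assumption.
best_mer = 17.74        # morpheme error rate of the best configuration (%)
rel_reduction = 8.32    # relative reduction versus the baseline triphone system (%)

baseline_mer = implied_baseline(best_mer, rel_reduction)
print(f"implied baseline error rate: {baseline_mer:.2f}%")
print(f"absolute reduction: {baseline_mer - best_mer:.2f} points")
```

If the assumption holds, the baseline would be roughly 19.35%, so the 8.32% relative gain corresponds to about 1.6 percentage points in absolute terms.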

Similar resources

Explicit Modeling of Phonological Changes in Finite-state Transducer Based Hungarian LVCSR

This article describes the operation and the experimental evaluation of the pronunciation modeling component of the first Hungarian large vocabulary continuous speech recognition system. The proposed method is based on the implementation of context dependent rewrite rules by weighted finite state transducers (WFSTs). The proposed phonological model decreases the error rate by 8.32% relatively c...

Finite-state Transducer Base with Explicit Modeling of Ph

Finite-state Transducer Based Phonology and Morphology Modeling with Applications to Hungarian LVCSR

This article introduces a novel approach to model phonology and morphosyntax in morpheme unit based speech recognizers. The proposed method is evaluated in our recent Hungarian large vocabulary continuous speech recognition (LVCSR) system. The architecture of the recognition system is based on the weighted finite state transducer (WFST) paradigm. The task domain is the recognition of fluently r...

Finite-state transducer based modeling of morphosyntax with applications to Hungarian LVCSR

This article introduces a novel approach to model morphosyntax in morpheme unit based speech recognizers. The proposed method is evaluated in our recent Hungarian large vocabulary continuous speech recognition (LVCSR) system. The architecture of the recognition system is based on the weighted finite state transducer (WFST) paradigm. The task domain is the recognition of fluently read sentences ...

Modeling Morphosyntax with Finite-state Transducers and Its Application to Hungarian LVCSR

Large vocabulary speech recognition systems for several languages have to use morphemes as the basic recognition units. Such systems frequently suffer from the over-generation property of the smoothed N-gram language model. The source of the problem is that most of the function morphemes are very short and their unigram likelihood is high. These morphemes are inserted frequently in the ...

Publication date: 2002
